@lingzhq lingzhq commented Jul 14, 2025

Description

This PR adds a standalone utility for comparing Trinity's RFT experiments. Because RFT training is stochastic, a fair comparison needs several runs per setting. The script parses the TensorBoard logs from the repeated runs, aggregates the results, and plots them with confidence intervals.
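For reviewers who want a picture of the aggregation step, here is a minimal sketch of how per-step values from repeated runs could be turned into a mean curve with a confidence band. It is not the code in this PR: the function name `aggregate_runs`, the normal-approximation interval (`z=1.96`), and the assumption that every run logs the same steps are all illustrative choices.

```python
import math
import statistics


def aggregate_runs(runs, z=1.96):
    """Aggregate per-step scalars from repeated runs (hypothetical helper).

    runs: list of lists, one inner list per run, each holding one value
          per logged step. All runs are assumed to share the same steps.
    z:    normal quantile for the interval (1.96 is roughly 95%).

    Returns three lists (means, lowers, uppers), one entry per step.
    """
    if len(runs) < 2:
        raise ValueError("need at least two runs to estimate an interval")
    n_steps = len(runs[0])
    if any(len(run) != n_steps for run in runs):
        raise ValueError("all runs must log the same number of steps")

    means, lowers, uppers = [], [], []
    # zip(*runs) groups the values of all runs that belong to the same step.
    for step_values in zip(*runs):
        mean = statistics.fmean(step_values)
        # Standard error of the mean from the sample standard deviation.
        sem = statistics.stdev(step_values) / math.sqrt(len(step_values))
        means.append(mean)
        lowers.append(mean - z * sem)
        uppers.append(mean + z * sem)
    return means, lowers, uppers
```

The three returned lists map directly onto a mean line plus a shaded band, for example with matplotlib's `plot` and `fill_between`. With only a handful of runs, a Student's t quantile would give a wider and more honest interval than the fixed `z` used here.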

Here is a sample plot generated by the script. It shows evaluation performance on the MATH500 benchmark for Qwen2.5-1.5B trained with GRPO on the GSM8K and MATH datasets, respectively:

example_pic

Note: The script requires the matplotlib package.

Example Usage:

In the current version, this script functions as a standalone utility. Users need to manually specify the paths and configuration for each experiment in a YAML file. To generate the plots, run the following command:

python scripts/multi_exps_plot/multi_exps_plot.py --config scripts/multi_exps_plot/plot_configs.yaml

[TODO] Automate the process of running repeated experiments and generating comparison plots.

Checklist

Please check the following items before the code is ready to be reviewed.

  • Code has passed all tests
  • Docstrings have been added/updated in Google Style
  • Documentation has been updated
  • Code is ready for review

@lingzhq lingzhq requested a review from yxdyc July 14, 2025 13:07
@pan-x-c pan-x-c merged commit 675ff5b into agentscope-ai:main Jul 15, 2025
2 checks passed